Abstract

The categorization ability of fully connected neural network models, with either discrete or continuous Q-state units, is studied in this work in replica-symmetric mean-field theory. Hierarchically correlated multi-state patterns in a two-level structure of ancestors and descendents (examples) are embedded in the network and the categorization task consists in recognizing the ancestors when the network is trained exclusively with their descendents. Explicit results for the dependence of the equilibrium properties of a $Q = 3$-state model and a $Q = \infty$-state model are obtained in the form of phase diagrams and categorization curves. A strong improvement of the categorization ability is found when the network is trained with examples of low activity. The categorization ability is found to be robust to finite threshold and synaptic noise. The Almeida-Thouless lines that limit the validity of the replica-symmetric results are also obtained.

87.10.+e; 64.60.Cn
I. INTRODUCTION

Multi-state attractor neural networks in which the units (neurons) can be in more than two 
states are, in general, more flexible and efficient biological or artificial devices than networks of 
binary units. Much work has been done over some time on the retrieval problem in multi-state 
networks of various architectures, with either simple or hierarchical patterns in more than two 
states. The retrieval problem consists in the recognition of patterns that have been stored in a 
network by means of a learning (or training) rule, when the network is set in an appropriate initial 
state to start its operating stage [1]. Thus, the retrieval problem deals with the memorization ability
of a network. The networks that have been considered are the dilute, the layered feed-forward and 
the fully connected networks ||2|-[lq|. 

More recently, some work has been done on the categorization problem in multi-state attractor networks [^^-^, following extensive studies of the problem in binary networks [22-31]. The categorization problem consists in the spontaneous recognition of a level of hierarchical patterns other than those stored in the training process of a network [22,23]. The problem deals with the ability to create a representation for concepts when the network is only exposed to examples in the training stage.

Some of the questions that one may ask are the following. First, one is interested in the minimal 
structure of the training patterns, and their number, in order to achieve a satisfactory recognition 
of a macroscopic number of hierarchically related ancestors. Second, one would like to know the 
recognition rate (number of patterns per neuron) of these ancestors and how stable they are as 
attractors of the network dynamics. The recognition quality is of primary interest and one may 
also want to check on the robustness of the recognition process to various kinds of noise. 

The simplest and most studied case of the categorization problem consists in the recognition of the ancestors of a two-level hierarchy of ancestors and descendents by a network trained only with the latter, according to a specific learning rule. The hierarchical patterns that are generated through a stochastic procedure [32] lead to correlations between patterns in different levels as well as correlations between patterns in the same level [p^]. As a consequence, there is a complex structure of attractors in a network with hierarchical patterns, in which the attractors may coincide neither with the training patterns nor with the ancestors, and it is of interest to study under what conditions the latter become stable attractors.

Patterns in more than two states, which represent a gradual coding, may have a low activity 
which is biologically appealing. Moreover, "small" patterns, in which a number of bits have been 
turned off, are patterns of low activity that can infer patterns of full size and thereby enhance the 
performance of a multi-state network, as demonstrated explicitly in works on both the retrieval 
problem [g,^ and the categorization problem. The dynamics of the latter has been studied in an
extremely dilute asymmetric three-state network with a monotonic neuron firing function and a
generalized Hebbian learning rule [18]. The extremely dilute network requires a vanishingly small
connectivity between neurons in order to allow for an exact solution of the network dynamics, and 
one may ask what the behavior would be for a network with full connectivity. 

The purpose of the present paper is to answer some of the questions raised above by investigating the equilibrium statistical mechanics of the categorization problem in a fully connected multi-state network with hierarchical patterns of low activity in a two-level hierarchy of ancestors and descendents. Our aim is to obtain the phase diagrams that describe the various regimes of performance of the network in terms of the relevant parameters: the activity of the training patterns, the dynamical activity of the firing units, the correlation between ancestors and descendents, the number of descendents, the multi-state threshold and the synaptic noise level, assuming a fixed activity of the ancestors. The quality of the performance of the network is described by so-called categorization curves that express the dependence of the categorization error on some of the parameters of the model.

Since it is known from the results on the retrieval problem in a Q-state network that the relevant phase diagrams become increasingly complex as one goes from the three to the four-state model [^, we consider a $Q = 3$-state model and a $Q = \infty$-state (graded response) model. We make use of a generalized Hebbian learning rule that has been used before [^Jl^. The outline of the paper is the following. In section II we present the general Q Ising-state model for the categorization problem. The mean-field theory for that model is summarized in section III. In section IV we present and discuss the results for $Q = 3$, in the absence or presence of synaptic noise, and in section V we present the results for the $Q = \infty$-state model. We conclude in section VI with a summary of the results.

II. THE MODEL 

Consider a network of $N$ nodes, $i = 1,\dots,N$. At time step $t$, the state of node $i$ is described by the variable $S_i(t)$, which can be in any one of the $Q$ Ising states

$$\sigma_k = -1 + \frac{2(k-1)}{Q-1} \tag{1}$$

in the interval $[-1,1]$, for $k = 1,\dots,Q$. The task to be performed by the network is the recognition of a macroscopic set of $p$ concepts $\{\xi_i^{\mu};\ \mu = 1,\dots,p;\ i = 1,\dots,N\}$, with $p = \alpha N$, where $\alpha$ is finite. During the learning stage, only a set of $s$ "small" examples $\{\xi_i^{\mu\rho};\ \mu = 1,\dots,p;\ \rho = 1,\dots,s;\ i = 1,\dots,N\}$ of each concept is presented to the network. By "small" examples we mean that a macroscopic number of bits in each example are turned off. The concepts are assumed to be independent identically distributed random variables with zero mean and variance $A$. The examples $\xi_i^{\mu\rho}$ of the concept $\xi_i^{\mu}$ are generated through a stochastic process based on an appropriate probability distribution $P(\lambda_i^{\mu\rho})$, given below, such that

$$\xi_i^{\mu\rho} = \xi_i^{\mu}\,\lambda_i^{\mu\rho} . \tag{2}$$

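The properties of the distribution $P(\lambda_i^{\mu\rho})$ will be chosen in accordance with the states of the neurons, Eq. (1). For finite $Q = 3$, say, $\lambda_i^{\mu\rho}$ assumes the values $+1$, $0$ or $-1$ depending, respectively, on the example $\xi_i^{\mu\rho}$ being in agreement with the concept $\xi_i^{\mu}$, being turned off, or being opposite to the concept at site $i$. In the case of continuous neurons, i.e., $Q \to \infty$, we assume that $\lambda_i^{\mu\rho}$ is a continuous variable in the interval $[-1,1]$. In either case, we assume that $\lambda_i^{\mu\rho}$ belongs to a set of independent random microscopic activities with mean

$$\langle \lambda_i^{\mu\rho} \rangle = b \tag{3}$$

and variance

$$\langle \lambda_i^{\mu\rho} \lambda_j^{\nu\sigma} \rangle = \left[ b^2 + (a - b^2)\,\delta_{\rho\sigma} \right] \delta_{ij}\,\delta_{\mu\nu} , \tag{4}$$

with $b^2 \le a \le 1$. The symbol $\delta$ represents the Kronecker delta. In consequence, we have the following relations,

$$\langle \xi_i^{\mu\rho} \xi_j^{\nu} \rangle = \langle \lambda_i^{\mu\rho} \xi_i^{\mu} \xi_j^{\nu} \rangle = b\,A\,\delta_{ij}\,\delta_{\mu\nu} \tag{5}$$

and

$$\langle \xi_i^{\mu\rho} \xi_j^{\nu\sigma} \rangle = \langle \lambda_i^{\mu\rho} \lambda_j^{\nu\sigma} \xi_i^{\mu} \xi_j^{\nu} \rangle = \left[ b^2 + (a - b^2)\,\delta_{\rho\sigma} \right] A\,\delta_{ij}\,\delta_{\mu\nu} . \tag{6}$$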

The mean activity of the examples becomes

$$\frac{1}{N} \sum_i \left( \xi_i^{\mu\rho} \right)^2 = a\,A \tag{7}$$

for every $\mu$ and $\rho$. According to Eqs. (3) and (5), $b$ is the correlation between an example and the concept to which it belongs. The pure multi-state model [Q can be obtained by taking the number of examples $s = 1$, the activity $a = 1$ and the correlation $b = 1$. Since $a \le 1$, the activity of the examples is not greater than the activity of the concepts. In this sense, we refer to "small" examples, with the effective "size" of the patterns being $N_a = aN$. In this model, the viewpoint is that the small examples are samples of the full-activity concepts to be inferred.

In this work we are interested in the capacity of the network to infer only large concepts of full activity from the set of examples and restrict ourselves, therefore, to binary concepts, $\xi_i^{\mu} = \pm 1$ with equal probability, that is to say, to the case $A = 1$. This task is considered to be successful if the categorization overlap

$$m_{\mu} = \frac{1}{N} \sum_i \xi_i^{\mu} S_i \tag{8}$$

between the concept $\{\xi_i^{\mu}\}$ and the network state $\{S_i\}$ approaches unity after the network has reached the equilibrium state. To quantify the performance of the network, we define the categorization error for the concept $\mu$ as

$$\epsilon_c = \frac{1}{2} \left( 1 - m_{\mu} \right) . \tag{9}$$

Thus, $\epsilon_c$ should be small in the categorization phase and 0.5 in the disordered phase.

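Next we pass to the discussion of the dynamics of the model, following the steps of ref. [Q and references therein. For a given configuration $\{S_i\}$ of the network, the local field $h_i$ on site $i$ is

$$h_i(\{S_j\}) = \sum_j J_{ij} S_j , \tag{10}$$

where the synapses $J_{ij}$ are constructed from the examples, according to the modified Hebb rule

$$J_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \sum_{\rho=1}^{s} \xi_i^{\mu\rho} \xi_j^{\mu\rho} \quad \text{for } i \neq j, \qquad J_{ii} = 0 . \tag{11}$$

The state of each site is updated asynchronously according to a Glauber (single spin-flip) dynamics in which the transition probabilities are given by

$$P\left( S_j(t + \Delta t) = \sigma_k \,\middle|\, \{S_i(t)\} \right) = \frac{\exp\left[ -\beta\,\epsilon_j\left( \sigma_k | h_j(\{S_i(t)\}) \right) \right]}{\sum_{l=1}^{Q} \exp\left[ -\beta\,\epsilon_j\left( \sigma_l | h_j(\{S_i(t)\}) \right) \right]} , \tag{12}$$

where $\beta = 1/T$ is the inverse temperature and the single-site energy, $\epsilon_j(s|h)$, is given by

$$\epsilon_j(s|h) = -h s + \theta s^2 . \tag{13}$$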

Here, $\theta$ is a non-negative constant that favors local states of small dynamical activity. In the absence of stochastic noise, the deterministic evolution of the system is ruled by

$$S_j(t + \Delta t) = \mathrm{dyn}(h_j(t)) , \tag{14}$$

where $\mathrm{dyn}(x)$ is the non-decreasing step function, for finite $Q$,

$$\mathrm{dyn}(x) = \sum_{k=1}^{Q} \sigma_k \left[ \Theta\left( \theta(\sigma_{k+1} + \sigma_k) - x \right) - \Theta\left( \theta(\sigma_k + \sigma_{k-1}) - x \right) \right] \tag{15}$$

with $\sigma_0 = -\infty$ and $\sigma_{Q+1} = +\infty$, in which $\Theta(x) = 1$ if $x > 0$ and $0$ otherwise. The spin on site $j$ assumes the state $\sigma_k$ given by Eq. (1) if the local field $h_j$ is bound by $\sigma_k + \sigma_{k-1} < h_j/\theta < \sigma_k + \sigma_{k+1}$. The width of the intermediate states with constant $\sigma_k$ for $1 < k < Q$ (that is, excluding the limiting values $\sigma_k = \pm 1$) is given by $4\theta/(Q-1)$. Thus, the width of the zero state for the three-state network studied below is $2\theta$. In the limit $Q \to \infty$, the input-output function, Eq. (15), becomes the piecewise linear function

$$\mathrm{dyn}(x) = \mathrm{sign}(x)\,\min\left( \frac{|x|}{2\theta},\, 1 \right) , \tag{16}$$



where $\min(x, y)$ means the minimum of $x$ and $y$. The slope of the linear part here is $1/2\theta$, which is the gain parameter of the continuous network. The equilibrium thermodynamic properties of the fully connected infinite network that follow from the above dynamics are described by the Hamiltonian

$$H = -\sum_{(ij)} J_{ij} S_i S_j + \theta \sum_i S_i^2 , \tag{17}$$

where the first sum is over all distinct pairs $(ij)$.
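As an illustration of the single-site dynamics described above, the following minimal Python sketch implements the Glauber probabilities, the zero-temperature update and its continuous limit. It assumes equidistant states $\sigma_k = -1 + 2(k-1)/(Q-1)$ and the single-site energy $-hs + \theta s^2$; the function names are hypothetical and chosen only for this example. The zero-temperature update is written as an energy minimization, which selects the same state as the step function of Eq. (15) whenever the field does not sit exactly on a boundary between two states.

```python
import math


def ising_states(Q):
    """Q equidistant states in [-1, 1] (assumed form of the Q Ising states)."""
    return [-1.0 + 2.0 * k / (Q - 1) for k in range(Q)]


def site_energy(s, h, theta):
    """Single-site energy -h*s + theta*s**2 for state s in local field h."""
    return -h * s + theta * s * s


def glauber_probs(h, theta, beta, Q):
    """Boltzmann weights over the Q states for one site at local field h."""
    energies = [site_energy(s, h, theta) for s in ising_states(Q)]
    e_min = min(energies)  # shift energies so that the exponentials cannot overflow
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def dyn(h, theta, Q):
    """Zero-temperature update: the state of lowest single-site energy."""
    return min(ising_states(Q), key=lambda s: site_energy(s, h, theta))


def dyn_continuous(h, theta):
    """Q -> infinity limit: sign(h) * min(|h| / (2*theta), 1)."""
    if h == 0.0:
        return 0.0
    return math.copysign(min(abs(h) / (2.0 * theta), 1.0), h)
```

For $Q = 3$ and $\theta = 0.2$, for example, `dyn(0.1, 0.2, 3)` selects the zero state because the field lies inside the window of width $2\theta$, while `dyn(0.3, 0.2, 3)` selects the state $+1$.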

The relevant order parameters, when the network is in the ordered sub-space of the phase space, are the retrieval overlaps

$$m_{\mu\rho} = \frac{1}{N} \sum_i \xi_i^{\mu\rho} S_i \tag{18}$$

between the actual state of the network and each one of the examples $\rho$ of each concept $\mu$. The underlying idea in studying the categorization performance of the network is that when the number of correlated examples is higher than a critical value, for a given correlation strength, single examples are no longer local minima of the free energy, but a mixed state having macroscopic symmetric overlap $m_{\mu\rho} = m_s$, for $\rho = 1,\dots,s$, with all the examples of a given concept $\mu$, becomes a minimum. This state characterizes the categorization phase, and it yields a finite, macroscopic overlap $m_{\mu}$ with concept $\mu$. Since we are interested mainly in the categorization ability of the network, we restrict ourselves, in what follows, to the study of configurations that have a macroscopic overlap of order $O(1)$ with a mixture of a finite number $s$ of examples of a given concept. Noting that the concepts are uncorrelated, one may concentrate on the overlap with any one of them, say $m_1$ for $\mu = 1$.
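The categorization mechanism described in this section can be tried out on a small network. The sketch below is only an illustration under stated assumptions: a finite three-state network at zero temperature, binary concepts, examples drawn with probabilities $(a+b)/2$, $1-a$ and $(a-b)/2$ for $\lambda = +1, 0, -1$ (the three-state distribution used later in the paper), and hypothetical function names and parameter values ($N = 200$, $p = 2$, $s = 20$, $a = b = 0.4$, $\theta = 0.2$). The network is trained only with the examples, started in one example, and relaxed asynchronously; the overlap with the concept is measured before and after.

```python
import random


def make_patterns(N, p, s, a, b, rng):
    """Binary concepts and three-state examples xi^{mu rho} = xi^mu * lambda."""
    concepts = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
    examples = []
    weights = ((a + b) / 2, 1 - a, (a - b) / 2)  # lambda = +1, 0, -1
    for xi in concepts:
        for _ in range(s):
            lam = rng.choices((1, 0, -1), weights=weights, k=N)
            examples.append([x * l for x, l in zip(xi, lam)])
    return concepts, examples


def hebb_couplings(examples, N):
    """J_ij = (1/N) * sum over all examples of xi_i * xi_j, with J_ii = 0."""
    J = [[0.0] * N for _ in range(N)]
    for ex in examples:
        active = [i for i in range(N) if ex[i] != 0]  # zero bits do not contribute
        for i in active:
            row, xi = J[i], ex[i]
            for j in active:
                row[j] += xi * ex[j] / N
    for i in range(N):
        J[i][i] = 0.0
    return J


def relax(J, state, theta, sweeps, rng):
    """Asynchronous zero-temperature three-state updates in random order."""
    N = len(state)
    for _ in range(sweeps):
        for i in rng.sample(range(N), N):
            h = sum(Jij * Sj for Jij, Sj in zip(J[i], state))
            state[i] = 0 if abs(h) <= theta else (1 if h > 0 else -1)
    return state


def overlap(pattern, state):
    return sum(x * S for x, S in zip(pattern, state)) / len(state)


def run_demo(seed=0, N=200, p=2, s=20, a=0.4, b=0.4, theta=0.2):
    rng = random.Random(seed)
    concepts, examples = make_patterns(N, p, s, a, b, rng)
    J = hebb_couplings(examples, N)
    state = list(examples[0])  # start in one example of the first concept
    m_before = overlap(concepts[0], state)
    state = relax(J, state, theta, sweeps=5, rng=rng)
    m_after = overlap(concepts[0], state)
    return m_before, m_after, 0.5 * (1.0 - m_after)


if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the initial overlap with the concept is close to the activity $a$ of the starting example, and the relaxed state should lie much closer to the concept itself, although the concept was never presented to the network.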

III. MEAN-FIELD THEORY 

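The free energy per site follows as

$$f(\beta) = -\lim_{N \to \infty} \frac{1}{\beta N} \left\langle \left\langle \ln Z(\beta) \right\rangle_{\{\lambda_i^{\mu\rho}\}} \right\rangle_{\{\xi_i^{\mu}\}} , \tag{19}$$

with the averages over examples and concepts in that order, as indicated, where $Z(\beta)$ is the canonical partition function

$$Z(\beta) = \sum_{\{S_i\}} \exp(-\beta H) . \tag{20}$$

In order to average over the quenched disorder, we employ the replica method, in which

$$\left\langle \left\langle \ln Z(\beta) \right\rangle_{\{\lambda_i^{\mu\rho}\}} \right\rangle_{\{\xi_i^{\mu}\}} = \lim_{n \to 0} \frac{1}{n} \left( \left\langle \left\langle Z^n(\beta) \right\rangle_{\{\lambda_i^{\mu\rho}\}} \right\rangle_{\{\xi_i^{\mu}\}} - 1 \right) . \tag{21}$$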



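Using the generalized Hebb learning rule, Eq. (11), and introducing a field $h_1$ in order to generate an equation for the overlap $m_1$, the Hamiltonian, Eq. (17), for the replica $a$ (a replica label, not to be confused with the activity $a$), becomes

$$H^{a} = -\frac{1}{2N} \sum_{i \neq j} \sum_{\mu\rho} \xi_i^{\mu\rho} \xi_j^{\mu\rho} S_i^{a} S_j^{a} + \theta \sum_i \left( S_i^{a} \right)^2 - h_1 \sum_i \xi_i^{1} S_i^{a} . \tag{22}$$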



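Introducing this expression in Eq. (20), separating the first concept, we linearize the quadratic terms and obtain the replicated partition function

$$\left\langle \left\langle Z^n(\beta) \right\rangle \right\rangle = \sum_{\{S_i^{a}\}} \int \prod_{a\rho} \left( dm_{\rho}^{a} \sqrt{\frac{\beta N}{2\pi}} \right) \exp\left[ -\frac{\beta N}{2} \sum_{a\rho} \left( m_{\rho}^{a} \right)^2 \right] \exp(n\beta p G) \left\langle \left\langle \exp\left\{ \beta \sum_{i,a} \left[ \sum_{\rho} m_{\rho}^{a} \lambda_i^{1\rho} \xi_i^{1} S_i^{a} - \frac{1}{2N} \sum_{\rho} \left( \lambda_i^{1\rho} \xi_i^{1} S_i^{a} \right)^2 - \theta \left( S_i^{a} \right)^2 + h_1 \xi_i^{1} S_i^{a} \right] \right\} \right\rangle_{\{\lambda_i^{1\rho}\}} \right\rangle_{\{\xi_i^{1}\}} , \tag{23}$$

where

$$\exp(n\beta p G) = \prod_{\mu > 1} \left\langle \left\langle \exp\left[ \frac{\beta}{2N} \sum_{i \neq j} \sum_{a} S_i^{a} S_j^{a} \sum_{\rho} \lambda_i^{\mu\rho} \lambda_j^{\mu\rho} \xi_i^{\mu} \xi_j^{\mu} \right] \right\rangle_{\{\lambda_i^{\mu\rho}\}} \right\rangle_{\{\xi_i^{\mu}\}} \tag{24}$$

involves the uncondensed examples.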



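In the thermodynamic limit, $N \to \infty$, we obtain, following ref. [35],

$$n\beta G = -\frac{1}{2}\, \mathrm{tr} \ln\left( 1 - \beta\gamma_1 Q \right) - \frac{1}{2}(s-1)\, \mathrm{tr} \ln\left( 1 - \beta\gamma_2 Q \right) - \frac{1}{2} \beta\gamma_1\, \mathrm{tr}\, Q - \frac{1}{2} \beta (s-1) \gamma_2\, \mathrm{tr}\, Q , \tag{25}$$

where

$$\gamma_1 = a + (s-1)\, b^2 \quad \text{and} \quad \gamma_2 = a - b^2 . \tag{26}$$

Here, $Q$ is a matrix in the space of replicas with elements given by

$$Q_{ab} = \frac{1}{N} \sum_i S_i^{a} S_i^{b} = q_{ab} \quad \text{if } a \neq b , \tag{27}$$

and

$$Q_{aa} = Q_a . \tag{28}$$

Thus, $q_{ab}$ is the spin-glass order parameter and $Q_a$ is the dynamical activity of the network. Whereas for the binary network in the replica-symmetric theory $Q_a = 1$, in the case of multi-state networks one has, in general, that $q_{ab} \le Q_a \le 1$.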

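Introducing, as usual, the parameter $r_{ab}$, associated with the correlation between the overlaps of the examples and concepts that do not condense, and restricting our study to the replica-symmetric solution, in which

$$q_{ab} = q , \qquad Q_a = a_D , \qquad r_{ab} = r , \tag{29}$$

we obtain that the replica-symmetric free energy per site can be rewritten as

$$f = \frac{1}{2} \sum_{\rho=1}^{s} m_{1\rho}^2 + \frac{\alpha}{2\beta} \left[ \ln(1 - \gamma_1 C) + \frac{\gamma_1 C}{1 - \gamma_1 C} + (s-1) \ln(1 - \gamma_2 C) + (s-1) \frac{\gamma_2 C}{1 - \gamma_2 C} \right] + \frac{\alpha}{2} \left[ \frac{\gamma_1^2\, q\, C}{(1 - \gamma_1 C)^2} + (s-1) \frac{\gamma_2^2\, q\, C}{(1 - \gamma_2 C)^2} \right] - \frac{1}{\beta} \left\langle \left\langle \int Dz\, \ln \sum_{\{S\}} e^{\beta H_{\rm eff}} \right\rangle_{\{\lambda^{1\rho}\}} \right\rangle_{\{\xi^{1}\}} , \tag{30}$$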

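where $Dz = dz\, \exp(-z^2/2)/\sqrt{2\pi}$ is a Gaussian measure. The effective Hamiltonian, $H_{\rm eff}$, is given by

$$H_{\rm eff} = \left( \sum_{\rho} m_{1\rho} \lambda^{1\rho} \xi^{1} + \sqrt{\alpha r}\, z + h_1 \xi^{1} \right) S - \theta' S^2 , \tag{31}$$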



where

$$\theta' = \theta - \frac{\alpha \gamma_1^2 C}{2(1 - \gamma_1 C)} - (s-1) \frac{\alpha \gamma_2^2 C}{2(1 - \gamma_2 C)} \tag{32}$$

is an effective width of the intermediate states, as will be seen below (see Eq. (50)). Eventually, depending on the state of the network specified by the dynamical activity $a_D$ and the spin-glass order parameter $q$, $\theta'$ may become negative, favoring an order with large absolute values for $S$. Although Eqs. (30)-(32) follow from the assumption of replica symmetry, we believe that such an order will exist, in general, albeit in a small region of the phase space. Here, $C = \beta(a_D - q) = \beta \sum_i \left( \langle S_i^2 \rangle - \langle S_i \rangle^2 \right)/N$ represents the susceptibility of the network. The parameter $r$ is given by the algebraic saddle-point equation

$$r = \frac{\gamma_1^2\, q}{(1 - \gamma_1 C)^2} + (s-1) \frac{\gamma_2^2\, q}{(1 - \gamma_2 C)^2} . \tag{33}$$



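The remaining saddle-point equations determining the order parameters are

$$m_{1\rho} = \left\langle \left\langle \lambda^{1\rho} \xi^{1} \int Dz\, \langle S(z) \rangle \right\rangle_{\{\lambda^{1\rho}\}} \right\rangle_{\{\xi^{1}\}} , \tag{34}$$

$$q = \left\langle \left\langle \int Dz\, \langle S(z) \rangle^2 \right\rangle_{\{\lambda^{1\rho}\}} \right\rangle_{\{\xi^{1}\}} \tag{35}$$

and

$$C = \frac{1}{\sqrt{\alpha r}} \left\langle \left\langle \int Dz\, z\, \langle S(z) \rangle \right\rangle_{\{\lambda^{1\rho}\}} \right\rangle_{\{\xi^{1}\}} . \tag{36}$$

In the above equations,

$$\langle S(z) \rangle = \frac{\sum_{S} S\, e^{\beta H_{\rm eff}}}{\sum_{S} e^{\beta H_{\rm eff}}} . \tag{37}$$

The susceptibility $C$ remains finite, so that at zero temperature, that is $\beta \to \infty$, $q \to a_D$, while at finite temperature we have in general $q < a_D$. The overlap with the first concept, which measures the categorization ability, is given by

$$m_1 = -\frac{\partial f}{\partial h_1} = \left\langle \left\langle \xi^{1} \int Dz\, \langle S(z) \rangle \right\rangle_{\{\lambda^{1\rho}\}} \right\rangle_{\{\xi^{1}\}} . \tag{38}$$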

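Performing the configurational average in the saddle-point equations, we obtain

$$m_s = \frac{b\, m_1}{1 - \gamma_2 C} \tag{39}$$

for the symmetric overlap,

$$q = \int Dz\, S_{\beta}^2(h_s, \theta') \tag{40}$$

and

$$C = \frac{1}{\sqrt{v}} \int Dz\, z\, S_{\beta}(h_s, \theta') , \tag{41}$$

as well as the overlap with the concept,

$$m_1 = \int Dz\, S_{\beta}(h_s, \theta') . \tag{42}$$

The effective transfer function $S_{\beta}(h_s, \theta')$ is given by

$$S_{\beta}(h_s, \theta') = \frac{\sinh(\beta h_s)}{\frac{1}{2} e^{\beta\theta'} + \cosh(\beta h_s)} \tag{43}$$

in the case of the three-state network, and

$$S_{\beta}(h_s, \theta') = \frac{h_s}{2\theta'} + \frac{1}{\sqrt{\pi\beta\theta'}}\, \frac{\exp[-\phi_+^2(h_s, \theta')] - \exp[-\phi_-^2(h_s, \theta')]}{\mathrm{erf}[\phi_+(h_s, \theta')] + \mathrm{erf}[\phi_-(h_s, \theta')]} , \tag{44}$$

where

$$\phi_{\pm}(h_s, \theta') = \sqrt{\beta\theta'} \left( 1 \pm \frac{h_s}{2\theta'} \right) , \tag{45}$$

for $Q \to \infty$. Thus, $1/2\theta'$ is the effective gain parameter for the continuous network. The effective field for the symmetric solution, $h_s$, is given by

$$h_s = s\, m_s b + z \sqrt{v} , \tag{46}$$

where

$$v = \alpha r + s\, m_s^2 \gamma_2 . \tag{47}$$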


The first term in Eq. (46) is a signal term, while the second term is the Gaussian noise due to the macroscopic number of uncondensed examples and the presence of the symmetric mixture states. The latter contribution, in which $\gamma_2 = a - b^2$ (cf. Eq. (26)), is reduced in the case of examples of low activity, $a < 1$. One should expect, thus, an enhancement of the categorization ability of the network in that case. The above equations are obtained under the assumption that the number of examples $s$ is large, so that the average over examples is given by a Gaussian distribution [19,23]. In the following sections we discuss the results based on the solutions of the saddle-point equations for both the three-state and the continuous network.
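The effect of the activity on the noise can be made concrete with a short numerical sketch. It assumes the limit in which the contribution $\alpha r$ of the uncondensed examples is negligible, so that the effective field has signal $s\, m_s b$ and noise variance $s\, m_s^2 \gamma_2$; the ratio of signal to noise width is then $b \sqrt{s/\gamma_2}$, independent of $m_s$. The function names are hypothetical, and the parameter values in the usage note are chosen only for illustration.

```python
import math


def gammas(a, b, s):
    """gamma_1 = a + (s - 1) * b**2 and gamma_2 = a - b**2."""
    return a + (s - 1) * b * b, a - b * b


def mixture_signal_to_noise(a, b, s):
    """Signal s*m_s*b divided by sqrt(s*m_s**2*gamma_2); requires a > b**2.

    Assumes alpha*r -> 0, so only the mixture-state noise is kept.
    """
    _, gamma_2 = gammas(a, b, s)
    return b * math.sqrt(s / gamma_2)
```

With $b = 0.4$ and $s = 20$, `mixture_signal_to_noise(0.4, 0.4, 20)` is about 3.65 while `mixture_signal_to_noise(1.0, 0.4, 20)` is about 1.95, so in this simplified limit lowering the activity from $a = 1$ to $a = b$ nearly doubles the ratio of signal to noise.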

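The limit of stability of the replica-symmetric solution comes from the study of quadratic fluctuations of the free energy in the vicinity of the symmetric saddle-point. Following the Almeida and Thouless (AT) analysis [36], we obtain

$$\alpha\beta^2 \left[ \frac{\gamma_1^2}{(1 - \gamma_1 C)^2} + (s-1) \frac{\gamma_2^2}{(1 - \gamma_2 C)^2} \right] \left\langle \left\langle \int Dz \left( \langle S^2(z) \rangle - \langle S(z) \rangle^2 \right)^2 \right\rangle_{\{\lambda^{1\rho}\}} \right\rangle_{\{\xi^{1}\}} < 1 \tag{48}$$

as the stability condition for the replica-symmetric solution.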

IV. THREE-STATE NETWORK 

A. Categorization properties at zero temperature 

We begin by discussing the results for the categorization performance in three-state networks in the absence of retrieval noise. The probability distribution in this case is given by

$$p(\lambda_i^{\mu\rho}) = \frac{a + b}{2}\, \delta(\lambda_i^{\mu\rho} - 1) + (1 - a)\, \delta(\lambda_i^{\mu\rho}) + \frac{a - b}{2}\, \delta(\lambda_i^{\mu\rho} + 1) , \tag{49}$$

satisfying the conditions (3) and (4). Thus, the example $\xi_i^{\mu\rho}$ has a probability $(a+b)/2$ to be aligned with the concept, while it has a probability $1 - a$ to be turned off and a probability $(a-b)/2$ to be opposed to the concept.
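The stated probabilities can be checked by direct sampling. The following sketch, with a hypothetical helper name and illustrative values $a = 0.3$, $b = 0.2$, draws the three-state variable $\lambda$ and estimates its mean and its second moment, which should approach the correlation $b$ and the activity $a$, respectively.

```python
import random


def sample_lambda(a, b, n, rng):
    """Draw n values of lambda with P(+1) = (a+b)/2, P(0) = 1-a, P(-1) = (a-b)/2."""
    return rng.choices((1, 0, -1), weights=((a + b) / 2, 1 - a, (a - b) / 2), k=n)


def empirical_moments(a, b, n=100000, seed=1):
    rng = random.Random(seed)
    lam = sample_lambda(a, b, n, rng)
    mean = sum(lam) / n                      # estimates the correlation b
    activity = sum(l * l for l in lam) / n   # estimates the activity a
    return mean, activity


if __name__ == "__main__":
    print(empirical_moments(0.3, 0.2))
```

Note that the weights are valid probabilities only for $b \le a \le 1$, which is slightly more restrictive than the general condition $b^2 \le a$ used for continuous $\lambda$.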

The effective transfer function, Eq. (43), at zero temperature becomes

$$S_{\infty}(h_s, \theta') = \lim_{\beta \to \infty} S_{\beta}(h_s, \theta') = \mathrm{sgn}(h_s)\, \Theta(|h_s| - \theta') . \tag{50}$$
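The zero-temperature limit can be checked numerically. The sketch below assumes the three-state transfer function in the form $\sinh(\beta h)/[\tfrac{1}{2} e^{\beta\theta'} + \cosh(\beta h)]$ and evaluates it in an algebraically equivalent form that avoids overflow at large $\beta$; the function names are hypothetical.

```python
import math


def transfer_three_state(h, theta_eff, beta):
    """sinh(beta*h) / (exp(beta*theta')/2 + cosh(beta*h)), evaluated overflow-safe."""
    x = beta * abs(h)
    y = beta * theta_eff
    if y - x > 700.0:  # the threshold term dominates completely
        return 0.0
    # Numerator and denominator multiplied by 2*exp(-x).
    value = (1.0 - math.exp(-2.0 * x)) / (math.exp(y - x) + 1.0 + math.exp(-2.0 * x))
    return math.copysign(value, h)


def transfer_zero_temperature(h, theta_eff):
    """sgn(h) * Theta(|h| - theta'), with Theta(x) = 1 only for x > 0."""
    if h == 0.0 or abs(h) <= theta_eff:
        return 0.0
    return math.copysign(1.0, h)
```

For a negative effective width the condition `abs(h) <= theta_eff` is never met, so any non-zero field gives $\pm 1$ and the unit behaves as a binary neuron.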

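From Eq. (32), we see that $\theta'$ may become negative. Since $S_{\infty}(h_s, \theta' < 0)$ is algebraically the same as $S_{\infty}(h_s, \theta' = 0)$, the network acts, in this case, as a binary network at $T = 0$. Accordingly, Eq. (39) remains unchanged, while Eqs. (40) and (41) become

$$q = 1 + \frac{1}{2}\, \mathrm{erf}\left( \frac{s m_s b - \theta' \Theta(\theta')}{\sqrt{2v}} \right) - \frac{1}{2}\, \mathrm{erf}\left( \frac{s m_s b + \theta' \Theta(\theta')}{\sqrt{2v}} \right) \tag{51}$$

and

$$C = \frac{1}{\sqrt{2\pi v}} \left\{ \exp\left[ -\frac{\left( s m_s b + \theta' \Theta(\theta') \right)^2}{2v} \right] + \exp\left[ -\frac{\left( s m_s b - \theta' \Theta(\theta') \right)^2}{2v} \right] \right\} . \tag{52}$$

The overlap with the concept, Eq. (42), is given by

$$m_1 = \frac{1}{2}\, \mathrm{erf}\left( \frac{s m_s b + \theta' \Theta(\theta')}{\sqrt{2v}} \right) + \frac{1}{2}\, \mathrm{erf}\left( \frac{s m_s b - \theta' \Theta(\theta')}{\sqrt{2v}} \right) . \tag{53}$$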
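The zero-temperature equations for the three-state network can be solved by a damped fixed-point iteration. The sketch below is an illustration, not the numerical procedure of the paper. It assumes the following forms, several of which are our reading of equations that are not fully legible in the source: $m_s = b\, m_1/(1 - \gamma_2 C)$ for the symmetric overlap, $r = \sum_k \gamma_k^2 q/(1 - \gamma_k C)^2$ and $\theta' = \theta - (\alpha/2) \sum_k \gamma_k^2 C/(1 - \gamma_k C)$ with $\gamma_1$ counted once and $\gamma_2$ counted $s - 1$ times, $v = \alpha r + s\, m_s^2 \gamma_2$, and error-function expressions for $m_1$, $q$ and $C$ obtained from a Gaussian field of mean $s\, m_s b$ and variance $v$. The function name, the starting point and the damping factor are hypothetical choices.

```python
import math


def solve_three_state_t0(alpha, s, a, b, theta, iterations=500, damping=0.5):
    """Damped fixed-point iteration of the assumed T = 0 saddle-point equations."""
    g1 = a + (s - 1) * b * b  # gamma_1
    g2 = a - b * b            # gamma_2
    m_s, m1, q, C = b, 1.0, 1.0, 0.0  # start close to perfect categorization
    theta_eff = theta
    for _ in range(iterations):
        d1, d2 = 1.0 - g1 * C, 1.0 - g2 * C
        if d1 <= 0.0 or d2 <= 0.0:
            raise ValueError("iteration left the region 1 - gamma*C > 0")
        r = g1 * g1 * q / d1 ** 2 + (s - 1) * g2 * g2 * q / d2 ** 2
        theta_eff = theta - 0.5 * alpha * (g1 * g1 * C / d1 + (s - 1) * g2 * g2 * C / d2)
        t = max(theta_eff, 0.0)  # theta' * Theta(theta')
        v = alpha * r + s * m_s * m_s * g2
        mu = s * m_s * b
        root = math.sqrt(2.0 * v)
        e_plus, e_minus = math.erf((mu + t) / root), math.erf((mu - t) / root)
        m1_new = 0.5 * e_plus + 0.5 * e_minus
        q_new = 1.0 + 0.5 * e_minus - 0.5 * e_plus
        C_new = (math.exp(-(mu + t) ** 2 / (2.0 * v))
                 + math.exp(-(mu - t) ** 2 / (2.0 * v))) / math.sqrt(2.0 * math.pi * v)
        ms_new = b * m1_new / (1.0 - g2 * C_new)
        # Damped update of all order parameters.
        m1 += damping * (m1_new - m1)
        q += damping * (q_new - q)
        C += damping * (C_new - C)
        m_s += damping * (ms_new - m_s)
    return {"m1": m1, "m_s": m_s, "q": q, "C": C,
            "theta_eff": theta_eff, "error": 0.5 * (1.0 - m1)}


if __name__ == "__main__":
    print(solve_three_state_t0(alpha=0.005, s=20, a=0.2, b=0.2, theta=0.2))
```

Starting close to $m_1 = 1$ selects the categorization solution when it exists; the sketch does not compare free energies, so it cannot tell whether the solution found is a global or only a local minimum, and it makes no statement about replica-symmetry breaking.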



We show in Fig. 1 the categorization phase diagram for the case where $s = 20$ and $a = 0.2 = b$. With the choice $a = b$ we are looking, in a way, for an optimal phase diagram, in the sense that the training examples either coincide with the corresponding concepts, that is $\xi_i^{\mu\rho} = \xi_i^{\mu}$, for $\rho = 1,\dots,s$, or are zero, but they are never opposed to the concept. For other values of the parameters, similar diagrams are obtained, although with lower capacity $\alpha$. The categorization phase (C), characterized by $m_1 \neq 0$ and $q \neq 0$, is globally stable below the heavy solid line. It becomes only locally stable, while the spin-glass (SG) phase is globally stable, between the heavy solid and the light solid line, where the system always jumps discontinuously to the spin-glass phase. This is in distinction with known results for the categorization phase diagram in the dilute network [19], where the transition to the spin-glass phase is partly continuous and partly discontinuous. Above the light solid line, and to the left of the dash-dotted line, where it disappears continuously, the spin-glass phase, with $m_1 = 0$ and $q \neq 0$, is stable. To the right of the dash-dotted line the paramagnetic (P), or zero, phase, with $m_1 = 0$ and $q = 0$, is stable. Note that, for large threshold $\theta$, there is a direct transition from the categorization phase to the fully disordered P phase, at low $\alpha$. There exists also a retrieval phase of examples, without categorization, not shown in the figure. Since we are dealing with a large number of examples (thus favoring the categorization), that phase is present in the phase diagram only at very small values of $\alpha$ and $\theta$. To the left of the dotted line, the effective width $\theta'$ is negative. Here, every non-zero value of the local field is sufficient to access the neural states $S_i = \pm 1$ and, in consequence, the network behaves in this region as a binary network. The dashed line signals the optimal $\theta$, i.e., the value of the width parameter for which the categorization overlap $m_1$ reaches its maximum value. It is interesting to note that the present phase diagram is similar to that of ref. |Q], for the retrieval problem, with the categorization phase taking the role of the retrieval phase in that problem.

An important question addressed in this paper refers to the role played by the activity of the examples, $a$, on the categorization ability of the network. In Fig. 2 the categorization error $\epsilon_c$ is shown as a function of the activity, for $\alpha = 0.02$, $s = 20$, $b = 0.2$ and for several values of $\theta$. The results reveal that $\epsilon_c$ is a monotonically increasing function of $a$. Since this is the general behavior for other values of the parameters, the results confirm that for the connected, as well as for the dilute [19] networks, it is better to train the network with low-activity examples. This can be understood by noting that the activity $a$ of the examples is decreased when a macroscopic number of bits of every example is turned off. But in keeping the overlap $b$ between examples and concepts fixed, the bits that are turned off in the examples must be those that are inverted with respect to the concepts. When the activity $a$ reaches its minimal, optimal value, $a = b$, the only bits that are turned on in the examples are those that are aligned with the concepts, leading to the smallest categorization error. In this case the categorization task of the network becomes similar to the reconstruction of a puzzle from loose pieces. Finally, the figure also shows the discontinuous jump to the spin-glass (SG) phase, at the upper phase boundary of Fig. 1.

The categorization error as a function of the number of examples $s$, for $b = 0.4$, $\theta = 0$ and $\theta = 1.0$ and two different activities, namely $a = b$ (all wrong bits in the examples are turned off) and $a = 1.0$ (all wrong bits in the examples are included), is shown in Fig. 3. Starting from the spin-glass phase, with categorization error equal to 0.5, the network undergoes a discontinuous transition to the categorization phase as the number of examples increases above a critical value. The number of examples required for the jump to the categorization phase is considerably smaller for $a = 0.4$ than for $a = 1.0$. Nevertheless, the final categorization error is similar for the two activities. This means that the network is able to compensate for a larger number of errors in the examples by a larger number of these examples. A higher value of $s$ is also required for a higher threshold $\theta$, in order to reach a local field high enough to attain the states with non-zero activity.
B. Categorization properties in the presence of synaptic noise

We consider next the categorization performance obtained from $S_{\beta}(h_s, \theta')$, Eq. (43), for finite $\beta$. Fig. 4 illustrates the influence of the temperature on the categorization error, for $\alpha = 0.01$, $s = 20$, $b = 0.2 = \theta$, and activity $a$ equal to 0.2 and 0.3. To the left of the arrow in the curve corresponding to $a = 0.2$, the categorization phase is the global minimum, while it is a local minimum to the right. For the present set of parameters, the categorization phase for $a = 0.3$ is a local minimum for all temperatures, whereas the spin-glass phase is the global minimum. Thus, it is also advantageous for the performance of the network, in the presence of synaptic noise, to train the network with examples of low activity. Fig. 4 also shows the discontinuous transition to the spin-glass phase at an activity-dependent transition temperature.

The phase diagram for $\alpha$ vs. $T$ is presented in Fig. 5 for $\theta = 0.2$, $s = 20$ and $a = 0.2 = b$. The categorization phase is stable below the upper phase boundary, where it disappears discontinuously, becoming a global minimum below the lower phase boundary. At very small $\alpha$ and $T$ there is a retrieval phase without categorization, not shown in the figure. The dashed line on the left is the locus of the AT line. The replica-symmetric solution for the categorization phase becomes unstable to replica-symmetry-breaking fluctuations to the left of this line. The re-entrant behavior of the upper phase boundary at low $T$ is associated with the instability of the replica-symmetric solution in this region. The spin-glass phase becomes a global minimum to the right of the heavy solid line and to the left of the dotted line, where it disappears continuously. To the right of the dotted line, the paramagnetic phase is the global minimum.

V. NETWORK WITH CONTINUOUS NEURONS 

In this section we discuss the categorization properties of a network with continuous, monotonic neurons trained with continuous or discrete examples of binary concepts. The continuous limit is obtained by taking $Q \to \infty$ in Eqs. (1) and (15). The following results are independent of the specific form of $P(\lambda_i^{\mu\rho})$, provided that its mean and variance are given by Eqs. (3) and (4), respectively. The general Eqs. (39)-(42), for the saddle-points, apply also to this case. In the absence of noise, the effective transfer function, Eq. (44), becomes the piecewise linear function

$$S_{\infty}(h, \theta') = \mathrm{sign}\left( \frac{h}{2\theta'} \right) \min\left( \left| \frac{h}{2\theta'} \right| ,\, 1 \right) , \tag{54}$$

in which $1/2\theta'$ is the effective gain parameter. Consequently we obtain



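$$m_1 = \frac{1}{2} \left( 1 - \frac{s m_s b}{2\theta'} \right) \mathrm{erf}(M_-) + \frac{1}{2} \left( 1 + \frac{s m_s b}{2\theta'} \right) \mathrm{erf}(M_+) + \frac{1}{2\theta'} \sqrt{\frac{v}{2\pi}} \left[ \exp\left( -M_+^2 \right) - \exp\left( -M_-^2 \right) \right] , \tag{55}$$

$$q = 1 - \frac{1}{2} \left[ 1 - \frac{v + (s m_s b)^2}{4\theta'^2} \right] \left[ \mathrm{erf}(M_+) - \mathrm{erf}(M_-) \right] - \frac{v}{4\theta'^2 \sqrt{\pi}} \left[ M_+ \exp\left( -M_-^2 \right) - M_- \exp\left( -M_+^2 \right) \right] \tag{56}$$

and

$$C = \frac{1}{4\theta'} \left[ \mathrm{erf}(M_+) - \mathrm{erf}(M_-) \right] , \tag{57}$$

where

$$M_{\pm} = \frac{s m_s b \pm 2\theta'}{\sqrt{2v}} . \tag{58}$$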
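Closed-form expressions of this kind are easy to cross-check against a direct Gaussian average of the clipped linear transfer function. The sketch below assumes a field $h = \mu + z\sqrt{v}$ with $\mu = s\, m_s b$ and a positive effective width $\theta'$, and it assumes the expressions for $m_1$, $q$ and $C$ in terms of $M_{\pm} = (\mu \pm 2\theta')/\sqrt{2v}$ that follow from averaging $\mathrm{sign}(h) \min(|h|/2\theta', 1)$ over $z$. Function names and the numerical values in the check are hypothetical.

```python
import math


def clipped_linear(h, theta_eff):
    """sign(h) * min(|h| / (2*theta'), 1) for theta' > 0."""
    return max(-1.0, min(1.0, h / (2.0 * theta_eff)))


def closed_forms(mu, v, theta_eff):
    """Assumed closed forms for m_1, q and C with mu = s*m_s*b."""
    root = math.sqrt(2.0 * v)
    Mp, Mm = (mu + 2.0 * theta_eff) / root, (mu - 2.0 * theta_eff) / root
    d_erf = math.erf(Mp) - math.erf(Mm)
    ratio = mu / (2.0 * theta_eff)
    m1 = (0.5 * (1.0 - ratio) * math.erf(Mm) + 0.5 * (1.0 + ratio) * math.erf(Mp)
          + math.sqrt(v / (2.0 * math.pi)) / (2.0 * theta_eff)
          * (math.exp(-Mp * Mp) - math.exp(-Mm * Mm)))
    q = (1.0 - 0.5 * (1.0 - (v + mu * mu) / (4.0 * theta_eff ** 2)) * d_erf
         - v / (4.0 * theta_eff ** 2 * math.sqrt(math.pi))
         * (Mp * math.exp(-Mm * Mm) - Mm * math.exp(-Mp * Mp)))
    C = d_erf / (4.0 * theta_eff)
    return m1, q, C


def gaussian_average(f, n=20001, zmax=10.0):
    """Trapezoidal estimate of the integral of f(z) over the Gaussian measure Dz."""
    dz = 2.0 * zmax / (n - 1)
    total = 0.0
    for k in range(n):
        z = -zmax + k * dz
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)


def numerical_forms(mu, v, theta_eff):
    sv = math.sqrt(v)
    m1 = gaussian_average(lambda z: clipped_linear(mu + z * sv, theta_eff))
    q = gaussian_average(lambda z: clipped_linear(mu + z * sv, theta_eff) ** 2)
    C = gaussian_average(lambda z: z * clipped_linear(mu + z * sv, theta_eff)) / sv
    return m1, q, C
```

Comparing `closed_forms(0.8, 0.15, 0.3)` with `numerical_forms(0.8, 0.15, 0.3)` gives a quick consistency check of the signs and prefactors before the expressions are used in a saddle-point solver.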



The zero-temperature phase diagram for $\alpha$ vs. $\theta$ with $s = 20$ and $a = 0.2 = b$ is shown in Fig. 6. The categorization phase exists below the light solid line, and it is the global minimum below the heavy solid line. To the left of the dotted line, $\theta'$ is zero and the effective gain is infinite. In this region, the states $S_j = \pm 1$ are the only accessible states for non-zero local field and the network behaves as a binary network. There, the critical $\alpha$ for categorization assumes its value in the binary network for this set of parameters, i.e., $\alpha_{c,\rm binary} \approx 0.033$. When the network enters the multi-state, continuous regime, the categorization capacity starts to increase abruptly, and reaches its maximum value $\alpha_c \approx 0.047$ for $\theta \approx 0.11$. The dashed line signals the optimum $\theta$ for each $\alpha$. It is worth noting that for $\alpha < \alpha_{c,\rm binary}$ the optimal $\theta$ line coincides with the transition to the binary regime. This means that whenever there is a binary network capable of performing the categorization task, it will give the best categorization properties for low $\theta$. Only when $\alpha > \alpha_{c,\rm binary}$ is the network with continuous neurons expected to have a better performance. Contrary to the case of finite $Q$, where at zero temperature the replica-symmetric solution is always unstable, there is here a region where it is stable, and this is the part of the phase diagram below the light dash-dotted line. The phase diagram illustrates that the network of continuous neurons is also robust to low gain in the states. The existence of a replica-symmetric stable phase at zero temperature was noticed in ref. [8], for the retrieval problem in a network of continuous neurons. Finally, the heavy dash-dotted line represents the onset of the continuous spin-glass transition.

In Fig. 7 we present the categorization error as a function of the activity of the examples, for $\alpha = 0.02$, $s = 20$, and $\theta = 0.2$ to $0.4$. Since we deal with an unspecified $P(\lambda_i^{\mu\rho})$, the only restriction imposed is $a \ge b^2$. We note from the figure that the categorization error is no longer a monotonically increasing function of the activity for all values of $\theta$. For $\theta = 0.4$, $\epsilon_c$ is a decreasing function of $a$, for small $a$. The reason is that in the case of large threshold $\theta$, the local field $h_i(\{S_j\})$ must be sufficiently high to overcome the threshold, and this is obtained through a moderate increase in the activity of the examples.

Finally, we discuss the influence of the number of examples $s$ on the categorization ability of networks with continuous neurons. Fig. 8 shows the categorization error as a function of $s$ for $a = 0.2 = b$, threshold ranging from 0.2 to 0.4 and $\alpha = 0.02$. As a result of the continuous nature of the units, for low threshold the categorization error decreases smoothly with the increasing number of examples. This is distinct from the previous case of discrete units, where an abrupt decrease in $\epsilon_c$ was observed even at $\theta = 0$ (see Fig. 3). Furthermore, the decrease in $\epsilon_c$ is no longer monotonic for all values of the threshold. For example, for $\theta = 0.2$ there is a local maximum in $\epsilon_c$ for $s \approx 30$.

VI. SUMMARY AND CONCLUDING REMARKS 

The categorization problem, which consists of the recognition of ancestors when a network is trained only with their descendents, is studied in this work for multi-state fully connected neural network models, keeping in mind an application to either artificial or biological networks in which the training is with sparsely coded patterns. Indeed, multi-state networks offer the possibility of recognizing full-sized patterns in networks trained with "small" patterns, in which a macroscopic number of bits have been reduced or, eventually, set to zero, thereby reducing the activity of the encoded patterns. We found that a low activity can enhance the categorization ability of a fully connected network in a significant way, by changing the threshold for firing of the units. This confirms and extends earlier results on an extremely dilute network of $Q = 3$-state neurons [18].

The way the network works for the categorization task is the following. After training with 
correlated examples, the network searches for stable symmetric mixture states, in place of pure
examples. If these patterns have low activity, it will be less likely that they have bits with opposite 
sign to the corresponding concepts. The recognition of the latter from the common features of the 
examples will thereby be enhanced. 

We derived formal expressions, within replica-symmetric mean-field theory, for the free energy 
and the relevant order parameters for the categorization problem in a fully connected neural 
network model, with units in general Q Ising-states and multi-state patterns belonging to a two-level 
hierarchy. Training of the network was assumed to take place through a generalized Hebbian 
learning rule involving only the descendents. These may be considered as corrupted examples of the 
ancestors (concepts) with a number of turned-off or inverted bits. Explicit results for the relevant 
phase diagrams and the categorization curves were then obtained for a Q = 3-state model with a 
monotonic activation function and for a monotonic Q = ∞-state model. In the first case we also 
checked the robustness of the network performance to synaptic noise. Our results are restricted to 
binary ancestors and multi-state descendents, although the case of multi-state ancestors has been 
considered in an extremely dilute network [21]. 
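
As a rough illustration of training with descendents only, the following Python sketch builds Hebbian couplings from three-state examples and iterates a zero-temperature three-state update. Several ingredients are assumptions made for the sketch and are not taken from this section: the 1/N normalization of the couplings, the removal of self-couplings, the parallel update S_i = sign(h_i) for |h_i| > θ and S_i = 0 otherwise, the example statistics (probabilities (a + b)/2, (a - b)/2 and 1 - a for aligned, inverted and switched-off bits), and all function names. It is a finite-size toy simulation, not the replica calculation.

```python
import numpy as np


def make_examples(concepts, s, a, b, rng):
    """Return examples of shape (p, s, N) with entries in {-1, 0, +1}."""
    p, n = concepts.shape
    probs = [(a + b) / 2, (a - b) / 2, 1.0 - a]  # assumed example statistics
    factors = rng.choice([1, -1, 0], size=(p, s, n), p=probs)
    return factors * concepts[:, None, :]


def hebbian_couplings(examples):
    """J_ij = (1/N) * sum over all examples of eta_i * eta_j, with J_ii = 0."""
    n = examples.shape[-1]
    flat = examples.reshape(-1, n).astype(float)
    couplings = flat.T @ flat / n  # assumed normalization
    np.fill_diagonal(couplings, 0.0)
    return couplings


def three_state_update(couplings, state, theta):
    """Parallel zero-temperature update with firing threshold theta."""
    field = couplings @ state
    return np.sign(field) * (np.abs(field) > theta)


def run(couplings, state, theta, steps=20):
    """Iterate the update until a fixed point or until steps are exhausted."""
    for _ in range(steps):
        new_state = three_state_update(couplings, state, theta)
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state


def overlap(state, concept):
    """Overlap (1/N) * sum_i S_i * xi_i between a state and a binary concept."""
    return float(np.mean(state * concept))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    concepts = rng.choice([-1, 1], size=(2, 1000))
    examples = make_examples(concepts, s=20, a=0.2, b=0.2, rng=rng)
    couplings = hebbian_couplings(examples)
    start = examples[0, 0]  # one low-activity example of the first concept
    final = run(couplings, start, theta=0.05)
    print("overlap with concept, start:", overlap(start, concepts[0]))
    print("overlap with concept, final:", overlap(final, concepts[0]))
```

The network never sees the concepts themselves, only the examples; comparing the overlap before and after the dynamics shows whether the state moved from a single example toward the common ancestor. The threshold value used here is tied to the assumed 1/N normalization and should not be compared directly with the θ values quoted in the figures.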



The limit of validity of the replica-symmetric solution was established in this work by locating 
the Almeida-Thouless lines. For Q = 3, the replica-symmetric solution is unstable in the absence 
of synaptic noise (T = 0) and there is a re-entrant behavior for the ratio α of recognized concepts 
at small synaptic noise, in accordance with earlier results on the retrieval problem [^,35] and on 
the categorization problem in connected networks of binary neurons [^. Nevertheless, since the 
replica-symmetric solution stabilizes at very small T, we argue that replica-symmetry breaking 
effects should be negligible, even at T = 0. On the other hand, in the case of the Q = ∞-state 
network there is a finite region of interest for the categorization performance where the 
replica-symmetric solution is stable, even at T = 0, as demonstrated explicitly in this work. 

To summarize, we succeeded in studying a fully connected multi-state neural network model 
for the categorization problem of recognizing binary concepts when the network is trained with 
Q-state examples of low activity, in place of the full activity patterns of a binary network of states 
Si = ±1. The work presented here can be extended in various directions. One is to infer 
multi-state concepts in a network with full connectivity; another is to study the categorization 
performance for sparsely coded sequential examples. In order to come closer to biological networks, 
it would also be interesting to consider the partial dilution of synapses. 

FIGURES 

FIG. 1. Phase diagram for the ratio α of recognized concepts as a function of the threshold 
θ, for the three-state network. The number of examples is s = 20 and the activity is a = 0.2 = b 
(the correlation parameter). Here, C, SG and P are the categorization, spin-glass and paramagnetic 
phases, respectively. Below the heavy solid line, the categorization phase is the absolute minimum 
of the free energy. Solid (dash-dotted) lines indicate a discontinuous (continuous) transition. The 
dashed line indicates the optimal value of θ. At the left of the dotted line, the network behaves as 
a binary network with states Si = ±1. 

FIG. 2. Categorization error as a function of the activity a, for the three-state network at 
T = 0, when α = 0.02, s = 20, b = 0.2 and θ = 0.0-0.3 (as indicated). 

FIG. 3. Categorization error as a function of the number of examples s, for the three-state 
network at T = 0, when α = 0.05, b = 0.4. Solid (dotted) lines correspond to θ = 0.0 (θ = 1.0) 
and the two lines at the left (right) correspond to a = 0.4 (a = 1.0). 

FIG. 4. Categorization error as a function of the temperature T, for the three-state network, 
when α = 0.01, s = 20, b = 0.2 = θ and a = 0.2 (solid line) and 0.3 (dotted line). 

FIG. 5. Categorization phase diagram of α vs. T, for the three-state network, when θ = 0.2, 
s = 20 and a = 0.2 = b. Below the heavy solid line the categorization phase is the absolute 
minimum of the free energy. Solid (dotted) lines indicate a discontinuous (continuous) transition. 
The replica symmetry is broken at the left of the dashed line. 

FIG. 6. Categorization phase diagram of α vs. θ, for the continuous Q = ∞-state network, 
at T = 0, when s = 20, a = 0.2 = b. Here, C, SG and P are the categorization, spin-glass 
and paramagnetic phases, respectively. Below the heavy solid line the categorization phase is 
the absolute minimum of the free energy. Solid (heavy dash-dotted) lines indicate discontinuous 
(continuous) transitions. The dashed line indicates the optimal value of θ. At the left of the dotted 
line, the network behaves as a binary network with states Si = ±1. Below the light dash-dotted 
line the replica-symmetric solution is stable. 

FIG. 7. Categorization error as a function of the activity a, for the Q = ∞-state network at 
T = 0, when α = 0.02, s = 20 and b = 0.2. The threshold values are 0.2, 0.3 and 0.4 (curves from 
right to left). 

FIG. 8. Categorization error as a function of the number of examples s, for the Q = ∞-state 
network at T = 0, when α = 0.02 and a = 0.2 = b. The threshold values are 0.2, 0.3 and 0.4 
(curves from left to right). 